Weighted Distance-Based Models for Ranking Data Using the R Package rankdist
rankdist is a recently developed R package that implements various distance-based ranking models. These models capture the probability of occurrence of rankings based on the distances between them. The package provides a framework for fitting and evaluating finite mixtures of distance-based models. This paper also presents a new probability model for ranking data based on a new notion of weighted Kendall distance. The new model is flexible and more interpretable than the existing models. We show that the new model has an analytic form of the probability mass function and that the maximum likelihood estimates of the model parameters can be obtained efficiently, even for rankings involving a large number of objects.
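To make the idea of a distance-based ranking model concrete, here is a minimal sketch in Python. It uses the ordinary (unweighted) Kendall distance and a Mallows-type model, P(r) proportional to exp(-lam * d(r, modal)). It is not the weighted Kendall model proposed in the paper, and it does not use the rankdist API; the function names and the parameter `lam` are illustrative assumptions.

```python
from itertools import combinations, permutations
from math import exp


def kendall_distance(r1, r2):
    """Count object pairs that the two rankings order differently.

    r1[i] and r2[i] are the ranks assigned to object i.
    """
    return sum(
        1
        for i, j in combinations(range(len(r1)), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )


def mallows_pmf(modal, lam):
    """Probability of every ranking under P(r) ∝ exp(-lam * d(r, modal)).

    Enumerates all n! rankings, so this is only feasible for small n.
    `modal` must be a tuple containing a permutation of 1..n.
    """
    n = len(modal)
    weights = {
        r: exp(-lam * kendall_distance(r, modal))
        for r in permutations(range(1, n + 1))
    }
    normaliser = sum(weights.values())
    return {r: w / normaliser for r, w in weights.items()}


# Hypothetical example: three objects, modal ranking (1, 2, 3).
pmf = mallows_pmf(modal=(1, 2, 3), lam=1.0)
for ranking, prob in sorted(pmf.items(), key=lambda kv: -kv[1]):
    print(ranking, round(prob, 4))
```

Rankings closer to the modal ranking receive higher probability, and a larger `lam` concentrates the mass more tightly around it. The brute-force normalisation here is only for illustration; the abstract's point is that the proposed model admits an analytic form that avoids this enumeration.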
TRIAGE: Characterizing and auditing training data for improved regression
Data quality is crucial for robust machine learning algorithms, with the
recent interest in data-centric AI emphasizing the importance of training data
characterization. However, current data characterization methods are largely
focused on classification settings, while regression settings remain
understudied. To address this, we introduce TRIAGE, a novel data
characterization framework tailored to regression tasks and compatible with a
broad class of regressors. TRIAGE utilizes conformal predictive distributions
to provide a model-agnostic scoring method, the TRIAGE score. We operationalize
the score to analyze individual samples' training dynamics and characterize
samples as under-, over-, or well-estimated by the model. We show that TRIAGE's
characterization is consistent and highlight its utility to improve performance
via data sculpting/filtering, in multiple regression settings. Additionally,
beyond sample level, we show TRIAGE enables new approaches to dataset selection
and feature acquisition. Overall, TRIAGE highlights the value unlocked by data
characterization in real-world regression applications.
Comment: Presented at NeurIPS 202
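The following Python sketch illustrates the general idea of scoring samples with a conformal predictive distribution and labelling them as under-, over-, or well-estimated. It is an assumption-laden illustration, not the TRIAGE score as defined in the paper: the score is taken here to be the empirical CDF of held-out calibration residuals evaluated at each sample's own residual, and the thresholds `low` and `high` are hypothetical.

```python
import numpy as np


def conformal_cdf_score(cal_residuals, y_true, y_pred):
    """Empirical CDF of calibration residuals at each sample's residual.

    Residuals are y - y_hat. Dividing by (n + 1) is the usual conformal
    convention that accounts for the test point itself.
    """
    cal = np.sort(np.asarray(cal_residuals, dtype=float))
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # side="right" counts calibration residuals <= the sample residual
    counts = np.searchsorted(cal, resid, side="right")
    return counts / (len(cal) + 1)


def characterize(scores, low=0.25, high=0.75):
    """Map scores to labels; the thresholds are illustrative assumptions."""
    scores = np.asarray(scores)
    return np.where(
        scores < low,
        "over-estimated",  # prediction well above the label
        np.where(scores > high, "under-estimated", "well-estimated"),
    )


# Hypothetical calibration residuals and three samples to characterize.
cal_residuals = np.arange(-49, 50)
scores = conformal_cdf_score(
    cal_residuals, y_true=[0.0, 500.0, -500.0], y_pred=[0.0, 0.0, 0.0]
)
print(scores)
print(characterize(scores))
```

A score near 0.5 means the sample's residual is typical of the calibration set; scores near 0 or 1 flag samples whose labels sit far below or far above the model's prediction. Because the score only needs predictions and residuals, any regressor can be plugged in, which is the sense in which such a score is model-agnostic.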
Between-centre differences for COVID-19 ICU mortality from early data in England.
Since the first cases in November 2019, the spread of SARS-CoV-2 infections has placed unprecedented strain on healthcare. The intensive care unit (ICU) is of particular concern as large numbers of patients with severe respiratory complications mean that in some areas, ICUs have been completely overwhelmed [1].
Neural Laplace Control for Continuous-time Delayed Systems
Many real-world offline reinforcement learning (RL) problems involve
continuous-time environments with delays. Such environments are characterized
by two distinctive features: firstly, the state x(t) is observed at irregular
time intervals, and secondly, the current action a(t) only affects the future
state x(t + g) with an unknown delay g > 0. A prime example of such an
environment is satellite control where the communication link between earth and
a satellite causes irregular observations and delays. Existing offline RL
algorithms have achieved success in environments with irregularly observed
states in time or known delays. However, environments involving both irregular
observations in time and unknown delays remain an open and challenging
problem. To this end, we propose Neural Laplace Control, a continuous-time
model-based offline RL method that combines a Neural Laplace dynamics model
with a model predictive control (MPC) planner--and is able to learn from an
offline dataset sampled with irregular time intervals from an environment that
has an inherent unknown constant delay. We show experimentally that, on
continuous-time delayed environments, it is able to achieve near-expert policy
performance.
Comment: Proceedings of the 26th International Conference on Artificial
Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume
206. Copyright 2023 by the author(s).
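The two features of the setting described above, a delayed action effect and irregularly timed observations, can be illustrated with a toy simulation in Python. This is not Neural Laplace Control and involves no learning or planning; the dynamics dx/dt = -x(t) + a(t - g), the Euler integrator, and all names and parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_delayed_system(action_fn, delay, t_end=5.0, dt=0.01, x0=1.0):
    """Euler-integrate dx/dt = -x(t) + a(t - delay).

    The action has no effect before t = delay, so a(t - delay) is taken
    to be zero while t - delay is negative.
    """
    n_steps = int(round(t_end / dt))
    times = np.arange(n_steps + 1) * dt
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        t_delayed = times[k] - delay
        a = action_fn(t_delayed) if t_delayed >= 0 else 0.0
        x[k + 1] = x[k] + dt * (-x[k] + a)
    return times, x


# Constant action of 1.0 that only reaches the state after a 0.5 s delay.
times, x = simulate_delayed_system(lambda t: 1.0, delay=0.5)

# Irregular observations: keep a random, time-ordered subset of the grid.
rng = np.random.default_rng(0)
obs_idx = np.sort(rng.choice(len(times), size=40, replace=False))
obs_times, obs_states = times[obs_idx], x[obs_idx]
print(obs_times[:5], obs_states[:5])
```

An offline dataset in this setting would consist of pairs like `(obs_times, obs_states)` together with the logged actions, with the delay itself unobserved. A dynamics model then has to infer both the continuous-time behaviour and the delay from such irregular samples.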
Retrospective cohort study of admission timing and mortality following COVID-19 infection in England.
OBJECTIVES: We investigated whether the timing of hospital admission is associated with the risk of mortality for patients with COVID-19 in England, and the factors associated with a longer interval between symptom onset and hospital admission. DESIGN: Retrospective observational cohort study of data collected by the COVID-19 Hospitalisation in England Surveillance System (CHESS). Data were analysed using multivariate regression analysis. SETTING: Acute hospital trusts in England that submit data to CHESS routinely. PARTICIPANTS: Of 14 150 patients included in CHESS until 13 May 2020, 401 lacked a confirmed diagnosis of COVID-19 and 7666 lacked a recorded date of symptom onset. This left 6083 individuals, of whom 15 were excluded because the time between symptom onset and hospital admission exceeded 3 months. The study cohort therefore comprised 6068 unique individuals. MAIN OUTCOME MEASURES: All-cause mortality during the study period. RESULTS: Timing of hospital admission was an independent predictor of mortality following adjustment for age, sex, comorbidities, ethnicity and obesity. Each additional day between symptom onset and hospital admission was associated with a 1% increase in mortality risk (HR 1.01; p<0.005). Healthcare workers were most likely to have an increased interval between symptom onset and hospital admission, as were people from Black, Asian and minority ethnic (BAME) backgrounds, and patients with obesity. CONCLUSION: The timing of hospital admission is associated with mortality in patients with COVID-19. Healthcare workers and individuals from a BAME background are at greater risk of later admission, which may contribute to reports of poorer outcomes in these groups. Strategies to identify and admit high-risk patients and those showing signs of deterioration in a timely way may reduce the consequent mortality from COVID-19, and should be explored.
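The cohort derivation and the per-day hazard ratio reported above can be checked with a few lines of Python. The counts and the HR of 1.01 come from the abstract; extending the per-day HR to longer delays by compounding assumes the effect is log-linear in days (as it would be for a continuous covariate in a proportional-hazards model), which the abstract does not state explicitly. Variable and function names are illustrative.

```python
# Cohort derivation, using the counts reported in the abstract.
included_in_chess = 14150
no_confirmed_diagnosis = 401
no_symptom_onset_date = 7666
interval_over_3_months = 15

eligible = included_in_chess - no_confirmed_diagnosis - no_symptom_onset_date
cohort = eligible - interval_over_3_months
print(eligible, cohort)  # 6083 6068, matching the abstract


def hazard_ratio_for_delay(days, hr_per_day=1.01):
    """Compound a per-day hazard ratio over a delay of `days`.

    Assumes a log-linear effect of delay on the hazard; this is an
    illustrative assumption, not a result reported in the study.
    """
    return hr_per_day ** days


for days in (1, 7, 14):
    print(days, round(hazard_ratio_for_delay(days), 3))
```

Under that assumption, a one-week delay corresponds to a hazard roughly 7% higher than immediate admission, which shows how a small per-day effect can accumulate over realistic delays.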
Clairvoyance: A Pipeline Toolkit for Medical Time Series
Time-series learning is the bread and butter of data-driven *clinical
decision support*, and the recent explosion in ML research has demonstrated
great potential in various healthcare settings. At the same time, medical
time-series problems in the wild are challenging due to their highly
*composite* nature: They entail design choices and interactions among
components that preprocess data, impute missing values, select features, issue
predictions, estimate uncertainty, and interpret models. Despite exponential
growth in electronic patient data, there is a remarkable gap between the
potential and realized utilization of ML for clinical research and decision
support. In particular, orchestrating a real-world project lifecycle poses
challenges in engineering (i.e. hard to build), evaluation (i.e. hard to
assess), and efficiency (i.e. hard to optimize). Designed to address these
issues simultaneously, Clairvoyance proposes a unified, end-to-end,
autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical
standard, and (iii) interface for optimization. Our ultimate goal lies in
facilitating transparent and reproducible experimentation with complex
inference workflows, providing integrated pathways for (1) personalized
prediction, (2) treatment-effect estimation, and (3) information acquisition.
Through illustrative examples on real-world data in outpatient, general wards,
and intensive-care settings, we show the applicability of the pipeline
paradigm on core tasks in the healthcare journey. To the best of our knowledge,
Clairvoyance is the first to demonstrate viability of a comprehensive and
automatable pipeline for clinical time-series ML.